Initial Look at Data

Dataset

Data Summary

Categorical Variables in Bar Charts

Numeric Variables in Histograms

Time Series Variables

Outliers and Notes

Categorical Variable Notes

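A closer look at dna_visittrafficsubtype shows that many of the subtypes are rare in this dataset: 31 of the 47 levels (counting NA) each account for less than 1% of rows, and together they make up only about 3.5%. NA is the single largest level, at roughly 24.9%.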

dna_visittrafficsubtype                    Count     Percent
------------------------------------------------------------
NA                                         62849  24.9458209
direct core homepage                       39719  15.7651364
paid search – dna brand                    20905   8.2975447
Email Campaigns                            20081   7.9704853
Paid Search Non Brand                      13400   5.3186845
internal referrals                         12296   4.8804884
direct non-homepage                        12288   4.8773130
paid search – core brand                   10425   4.1378571
organic dna brand                           9205   3.6536187
Email Programs                              8642   3.4301546
direct dna homepage                         7610   3.0205365
email no source id                          6693   2.6565638
organic core brand                          6341   2.5168491
External Paid Media                         4911   1.9492582
Affiliate External                          4066   1.6138635
organic nonbrand                            3569   1.4165959
geo-redirect                                1536   0.6096641
iOS App                                     1534   0.6088703
Paid Search GDN                             1348   0.5350438
content marketing                           1114   0.4421653
social media organic                        1101   0.4370053
external referrals                           807   0.3203118
Direct Mail                                  355   0.1409054
Social                                       269   0.1067706
Web Property                                 167   0.0662851
Partners                                     155   0.0615221
Search                                       108   0.0428670
Radio Brand/PR                                92   0.0365163
Android App                                   67   0.0265934
Digital Video                                 63   0.0250058
Direct                                        37   0.0146859
FindAGrave                                    37   0.0146859
Social Media Natural                          30   0.0119075
Telemarketing Other (short term 8/31/05)      30   0.0119075
Windows App                                   26   0.0103198
Inbound                                       13   0.0051599
External Email                                12   0.0047630
Display                                        9   0.0035723
DNA App                                        9   0.0035723
FTM Software Integration                       6   0.0023815
TV Brand/PR                                    5   0.0019846
Feeders                                        3   0.0011908
Mobile                                         3   0.0011908
Biz Dev                                        2   0.0007938
Overlays                                       2   0.0007938
Kiosk                                          1   0.0003969
Library/Assoc.                                 1   0.0003969
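
A count-and-percent table like the one above, plus one possible way of handling the rare subtypes, can be sketched in a few lines of Python. This is an illustration only, not the code that produced the report: the function names, the 1% threshold, the "Other" label, and the toy sample are all assumptions.

```python
from collections import Counter


def frequency_table(values):
    """Return (level, count, percent) rows sorted by descending count."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(level, n, 100.0 * n / total) for level, n in counts.most_common()]


def lump_rare(values, min_percent=1.0, other_label="Other"):
    """Replace every level whose share is below min_percent with other_label."""
    rare = {level for level, _, pct in frequency_table(values) if pct < min_percent}
    return [other_label if v in rare else v for v in values]


# Toy data, not the real dataset; None stands in for NA.
sample = (
    ["direct core homepage"] * 100
    + [None] * 50
    + ["Email Campaigns"] * 49
    + ["Kiosk"]
)

for level, n, pct in frequency_table(lump_rare(sample)):
    print(f"{str(level):<25}{n:>6}{pct:>12.7f}")
```

Whether to lump rare levels at all, and at what threshold, depends on how the variable is used downstream; the 1% cutoff here is arbitrary.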

Time Series Variables (improved)

After removing the outlier dates (noted above) from ordercreatedate, the general trend is easier to see.

After removing the NAs from dnatestactivationdayid, the general trend is easier to see.
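
Both cleaning steps, dropping missing values and excluding outlier dates, can be expressed as one filter applied before counting records per day. The sketch below is a hedged illustration: the function name, the toy values, and the date window are hypothetical, and the cutoffs actually used for ordercreatedate are not stated in this section.

```python
from collections import Counter
from datetime import date


def daily_counts(dates, start=None, end=None):
    """Count records per day, skipping missing values and dates outside [start, end]."""
    kept = [
        d for d in dates
        if d is not None
        and (start is None or d >= start)
        and (end is None or d <= end)
    ]
    return sorted(Counter(kept).items())


# Toy values only; the window is a hypothetical cutoff, not the report's.
orders = [date(1900, 1, 1), date(2016, 3, 1), date(2016, 3, 1), None, date(2016, 3, 2)]
trend = daily_counts(orders, start=date(2016, 1, 1), end=date(2016, 12, 31))

for day, n in trend:
    print(day.isoformat(), n)
```

Calling `daily_counts` without `start` and `end` removes only the missing values, which corresponds to the NA-only cleanup described for dnatestactivationdayid.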